Identification of neurological complications in childhood influenza: a random forest model

Background Among the neurological complications of influenza in children, the most severe is acute necrotizing encephalopathy (ANE), which carries a high mortality rate and frequent neurological sequelae. ANE is characterized by rapid progression to death within 1–2 days from onset. However, knowledge about the early diagnosis of ANE is limited, and it is often misdiagnosed as simple seizures/convulsions or mild acute influenza-associated encephalopathy (IAE). Objective To develop and validate an early prediction model to discriminate ANE from two common neurological complications, seizures/convulsions and mild IAE, in children with influenza. Methods This retrospective case-control study included patients with ANE (median age 3.8 (2.3, 5.4) years), seizures/convulsions alone (median age 2.6 (1.7, 4.3) years), or mild IAE (median age 2.8 (1.5, 6.1) years) at a tertiary pediatric medical center in China between November 2012 and January 2020. The random forest algorithm was used to screen the characteristics and construct a prediction model. Results Of the 433 patients, 278 (64.2%) had seizures/convulsions alone, 106 (24.5%) had mild IAE, and 49 (11.3%) had ANE. The discrimination performance of the model was satisfactory, with an accuracy above 0.80 in both model development (84.2%) and internal validation (88.2%). Seizures/convulsions were less likely to be wrongly classified (3.7%, 2/54), whereas mild IAE was prone to be misdiagnosed as seizures/convulsions (22.7%, 5/22), and a small proportion (4.5%, 1/22) was misdiagnosed as ANE. Of the children with ANE, 22.2% (2/9) were misdiagnosed as mild IAE, and none were misdiagnosed as seizures/convulsions. Conclusion This model can distinguish ANE from seizures/convulsions with high accuracy and from mild IAE with close to 80% accuracy, providing valuable information for the early management of children with influenza.
Supplementary Information The online version contains supplementary material available at 10.1186/s12887-024-04773-4.


Introduction
Neurological complications of influenza are uncommon, but permanent sequelae and death are not rare [1]. Seizures or febrile convulsions and influenza-associated encephalopathy (IAE) are two of the most frequently reported neurological complications and a common cause of hospital admission among children with influenza. During hospitalization, some children have seizures/convulsions alone or mild IAE, but others may quickly progress to the most severe category of IAE, acute necrotizing encephalopathy (ANE), which has a high frequency of neurologic sequelae (33-50%) and a mortality rate of around 30% [2][3][4]. The majority of patients with IAE are 1 to 5 years old [5], while ANE typically occurs in children < 5 years of age and is characterized by rapid progression to encephalopathy, coma, or death within 1-2 days from onset [6][7][8]. Therefore, early diagnosis and intervention for ANE are crucial.
ANE is defined as acute fever, frequent convulsions, acute disturbance of consciousness or even coma, and multiple organ failure, with a risk of death [6][7][8]; biochemical changes are not specific [4], but imaging shows brain edema and necrosis of the thalamus and other deep brain structures [4,9,10]. Generally, contrast-enhanced computed tomography (CT) can detect ring-shaped enhancement of the thalamus and deep brain white matter 3 days after illness onset [11]. However, irregular high-density shadows in the hypothalamic mottled low-density area did not appear until 7 days after illness onset, while no abnormal lesions were found in patients who died within 30 h. Similarly, significant gray matter damage volume changes can be observed using conventional magnetic resonance imaging (MRI) [12]. Three days after onset, the thalamus displays a concentric-ring pattern on the T1-weighted image (T1WI). In the second week, the T1WI reveals ring-shaped increased signal intensity in the thalamus. Moreover, diffusion-weighted imaging (DWI) and apparent diffusion coefficient (ADC) maps show a concentric pattern in the acute phase of typical cases [13]. These imaging studies show that brain CT or MRI may reveal no abnormalities in the early stage of ANE, which is one of the reasons for the low early diagnosis rate of ANE. By the time the typical ANE imaging findings appear, the patient may have missed the opportunity for early intervention and may progress to death or severe sequelae. Therefore, finding clinical indicators for early identification of ANE among patients with neurological complications of influenza is necessary. However, knowledge about the reliable and early diagnosis of ANE is limited.
ANE can be controlled, and the prognosis can be improved, if early treatment is undertaken, such as low brain temperature, antiviral medication, immunoglobulin, glucocorticoids, and plasma exchange [4]. Before rapid antigen tests for influenza were widely available, antiviral treatment was usually not given until the positive results of nucleic acid PCR returned (probably 24 h after sampling in our medical center). Many ANE patients rapidly deteriorated into coma or other consciousness disorders before positive results were obtained or antiviral treatment was given. Therefore, indicators that can be quickly obtained in the emergency department, such as detailed demographic and clinical characteristics and biochemical and hematologic indicators in serum samples, may have clinical significance for the early diagnosis of ANE. Compared with seizures/convulsions alone or mild IAE, ANE is a relatively rare entity that can masquerade as febrile convulsions or other neurological complications in the early phase [14]. Most recent studies on the clinical characteristics of ANE were limited by small sample sizes, were based on case reports/series, and recruited only ANE patients [3][15][16][17][18]. To date, there is a lack of studies discriminating ANE from two common neurological complications, seizures/convulsions and IAE, in children with influenza. Random forest (RF) is a robust and commonly used machine learning algorithm that can be applied to disease diagnosis and classification and is good at describing the relationship between independent and dependent variables with high flexibility and sufficient accuracy [19]. Therefore, the objective of our study was to develop and validate an early prediction model using a random forest model to distinguish ANE from seizures/convulsions alone and mild IAE in children with influenza.

Study design and patients
To develop the multivariable diagnostic model, we designed and implemented a retrospective case-control study. This study was approved by the Ethics Committee of Guangzhou Women and Children's Medical Center ([2019]38,201). All patients signed an informed consent form upon admission. The reporting of this study followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement.
Patients who were hospitalized with influenza virus infection and had neurological manifestations at hospitalization between November 2012 and January 2020 at Guangzhou Women and Children's Medical Center (GWCMC), a tertiary care center in Guangzhou, were enrolled. The inclusion criteria were: (1) age < 18 years; (2) laboratory confirmation of influenza; and (3) neurological manifestations, such as convulsion, acute cognitive impairment, acute disturbance of consciousness, coma, and abnormalities in the cerebrospinal fluid examination or head imaging. Patients were considered ineligible if they met any of the following exclusion criteria: (1) admission > 7 days after onset; (2) co-infection with other pathogens; (3) comorbidities such as brain trauma, sequelae of viral encephalitis, or metabolic diseases; (4) missing data > 30%; or (5) neurological complications other than ANE, seizures/convulsions, and mild IAE.
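The inclusion and exclusion criteria above can be read as a screening predicate applied to each record. The following is a minimal sketch under stated assumptions, not the authors' extraction code: the record layout and every field name (`age_years`, `missing_fraction`, and so on) are hypothetical, since the structure of the medical records system is not described.

```python
# Diagnoses retained in the study; any other neurological complication is excluded.
ELIGIBLE_DIAGNOSES = {"seizures", "mild_IAE", "ANE"}

def is_eligible(p):
    """Return True if a patient record passes the inclusion and exclusion criteria.

    `p` is a dict with hypothetical field names chosen for this illustration.
    """
    included = (
        p["age_years"] < 18
        and p["influenza_confirmed"]          # laboratory confirmation
        and p["neurological_manifestation"]
    )
    excluded = (
        p["days_onset_to_admission"] > 7
        or p["coinfection"]
        or p["excluded_comorbidity"]          # e.g. brain trauma, metabolic disease
        or p["missing_fraction"] > 0.30       # more than 30% of data missing
        or p["diagnosis"] not in ELIGIBLE_DIAGNOSES
    )
    return bool(included and not excluded)

# Hypothetical record, for illustration only.
example = {
    "age_years": 3.8,
    "influenza_confirmed": True,
    "neurological_manifestation": True,
    "days_onset_to_admission": 2,
    "coinfection": False,
    "excluded_comorbidity": False,
    "missing_fraction": 0.10,
    "diagnosis": "ANE",
}
```

Writing the criteria as two separate boolean groups keeps the inclusion and exclusion logic individually auditable against the numbered lists in the text.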

Outcome to be predicted and reference standard
There were three outcomes for all subjects who met the inclusion criteria: seizures/convulsions, mild IAE, and ANE. Seizures/convulsions related to influenza were defined as convulsive seizures during fever, being conscious after the convulsion, a maximum of two seizure/convulsion events, and no abnormalities in the cerebrospinal fluid (CSF) examination or brain imaging, if done. The seizures/convulsions were generalized tonic-clonic seizures, except in a very few patients with previously diagnosed febrile seizures who presented with atonic episodes and binocular staring. As the pathogenesis of IAE and ANE is not fully understood, the symptoms of the two disorders overlap [20][21][22]. To avoid confusion, we defined mild IAE as IAE that presented with short-term convulsions (< 5 min) and coma lasting more than 24 h [6,23], but with complete recovery and without abnormal results in CSF or neuroimaging. ANE was defined as acute fever, frequent convulsions (≥ 3 times), coma, multiple organ failure, and even death [2,7,8], with imaging showing brain edema and necrosis of the thalamus and other deep brain structures [4,9,10].

Candidate predictors
Detailed demographic characteristics, clinical characteristics at admission, and biochemical and hematologic indicators in serum samples were extracted from the structured electronic medical records system (EMRS) (Table 1). The first blood sample for measurement of hematologic indicators was obtained immediately after admission.

Prediction model development
The prediction model was developed using an RF classification algorithm, an ensemble of decision trees known for its robustness to outliers and noise, which is crucial in clinical datasets like ours [19,25,26]. The RF approach was preferred over other methods such as gradient boosted trees (GBT) because of its inherent resistance to overfitting, which is particularly important given the high number of predictors in our dataset. Furthermore, the ability of RF to provide a direct measure of variable importance and its internal estimation of generalization error through the out-of-bag error estimate were pivotal reasons for its selection. These features make RF particularly suitable for our study, where model interpretability and robustness are essential. The two main parameters in RF, mtry (the number of variables randomly sampled as candidates at each split) and ntree (the number of trees in the forest), were set to the square root of the number of predictors and 500, respectively, to balance model accuracy and computational efficiency. Missing values were addressed by median imputation for each variable to minimize bias. The proportions of missing values in each variable are presented in Supplementary Material, Table S1.
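The two preprocessing choices described above, per-variable median imputation and the square-root rule for mtry, can be sketched in a few lines. This is an illustrative sketch using only the Python standard library, not the study's R code; the function names are hypothetical, missing values are assumed to be encoded as `None`, and rounding the square root down is an assumption because the text does not state the rounding rule.

```python
import math
from statistics import median

NTREE = 500  # number of trees reported in the text

def impute_median(rows):
    """Replace None in each column with the median of that column's observed values.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        medians.append(median(observed))
    return [
        [medians[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]

def default_mtry(n_predictors):
    """Square-root rule for mtry; flooring is an assumption of this sketch."""
    return max(1, math.floor(math.sqrt(n_predictors)))
```

For example, with 10 predictors this rule gives an mtry of 3.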
The dataset was randomly split using 5-fold cross-validation of the RF method: 80% formed the training set used to fit the model, and the remaining 20% formed the validation set used to obtain unbiased estimates of correct classification rates and variable importance. The correct classification rate (accuracy) was defined as the number of observations correctly classified divided by the sample size.
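The 80%/20% partition implied by 5-fold cross-validation can be illustrated as follows. This is a sketch under assumptions: the function name `k_fold_indices` and the fixed seed are hypothetical, the split is not stratified by outcome, and the study's actual partitioning procedure in R is not described in detail.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and return k (train, validation) index pairs.

    Each validation fold holds roughly n/k observations (about 20% for k=5),
    and the matching training set holds the remaining observations.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    splits = []
    for i in range(k):
        validation = folds[i]
        train = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        splits.append((train, validation))
    return splits

# 433 is the number of enrolled patients reported in the text.
splits = k_fold_indices(433)
```

Every observation appears in exactly one validation fold, so each patient is used for validation once and for training four times.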

Statistical analysis
Data were expressed as median (interquartile range (IQR)) for non-normally distributed variables and number (percentage) for categorical variables. The normality of the data distribution was examined using the Shapiro-Wilk test. Baseline characteristics were compared among patients with seizures/convulsions, IAE, and ANE using the Kruskal-Wallis test for continuous variables and the Chi-square/Fisher's exact test for categorical variables. The clinical and laboratory data were compared between the training and validation sets using the Mann-Whitney U test and the Chi-square/Fisher's exact test. A two-sided P-value of < 0.05 was regarded as statistically significant. Data management and statistical analyses were conducted using SAS (version 9.4, SAS Institute Inc.) and R software (version 3.2.5, R Project for Statistical Computing).

Characteristics of the patients
A total of 433 patients met the eligibility criteria and were enrolled (Fig. 1). The median age of all patients was 2.8 (IQR 1.7-6.1) years, and the majority were male (n = 294, 67.89%). Among them, 278 (64.2%) were ultimately
a Four patients with a history of febrile seizures presented with transient atonic episodes and binocular staring rather than generalized tonic-clonic seizures
b Seventeen cases presented with complex febrile seizures or febrile status epilepticus (FSE), in which consciousness did not fully recover between seizures [24]

Variable selection and model development
Variable selection was carried out using different subsets of features. The top 15 variables, in order of importance, are shown in Fig. 3. Higher importance values indicate that a variable has more impact on predictions. Figure 4 shows the relationship between the cross-validation error and the number of variables. The error dropped rapidly at first and then increased gradually with the number of variables. The minimum error of 0.16 was achieved with 10 variables. Thus, we included 10 features in the model: convulsions, procalcitonin (PCT), urea, γ-glutamyl transferase (γ-GT), aspartate aminotransferase (AST), albumin/globulin ratio (A/G), α-hydroxybutyric dehydrogenase (α-HBD), alanine aminotransferase (ALT), alkaline phosphatase (ALP), and C-reactive protein (CRP).
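The selection rule described above, choosing the number of variables at which the cross-validation error is lowest, amounts to an argmin over the error curve. The sketch below illustrates only that rule. The function name is hypothetical, and the error values are invented for illustration except for the single reported point (10 variables, error 0.16); the actual curve is the one shown in Fig. 4.

```python
def best_n_variables(cv_error):
    """Return the number of variables with the lowest cross-validation error.

    `cv_error` maps a number of variables to its cross-validation error.
    Ties are broken in favour of the smaller model.
    """
    return min(sorted(cv_error), key=lambda k: cv_error[k])

# Hypothetical curve: only the (10, 0.16) point comes from the text.
example_curve = {1: 0.30, 5: 0.20, 10: 0.16, 15: 0.17, 20: 0.18}
chosen = best_n_variables(example_curve)
```

Breaking ties toward fewer variables is a design choice of this sketch that favours a simpler model; the text does not say how ties, if any, were handled.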

Variable influence
The independent influences of the 10 variables mentioned above on the seizure outcome were calculated by the random forest (Fig. 5). Each variable's effect, shown on the vertical axis, quantifies the change in predictive accuracy for the outcome when that variable's value is modified within the model while other factors are held constant. The horizontal axis represents the levels or values of each variable, which allows us to observe how changes in a variable's level are associated with changes in its predictive influence, known as the variable effect. For instance, a low number of convulsions (1-2) at admission generally implies a straightforward diagnosis of seizures without further complications, reflected by a higher variable effect score. Conversely, a higher number of convulsions (≥ 3 or 0) may suggest a more complex clinical scenario such as mild IAE or

Model performance and validation
The prediction model achieved a prediction accuracy of 84.2%. To obtain unbiased estimates of accuracy, the model was internally validated using 5-fold cross-validation. The 10 variables included in the model and the outcomes were compared between the training and validation sets (Table 2); no significant intergroup differences were observed. When applied to the held-out validation set, the prediction accuracy was 88.2%, indicating good discriminatory performance.
In the validation set, seizures/convulsions were less likely to be wrongly classified (3.7%, 2/54), whereas mild IAE was prone to be misdiagnosed as seizures/convulsions (22.7%, 5/22), and a small proportion (4.5%, 1/22) was misdiagnosed as ANE (Table 3). Of the children with ANE, 22.2% (2/9) were misdiagnosed as mild IAE, and none were misdiagnosed as seizures. Furthermore, the accuracies of classifying seizures from the other two classes, mild IAE from the other two classes, and ANE from the other two classes were 0.95, 0.92, and 0.90, respectively (Fig. 6). This suggests that the model performs better when used only to distinguish between two classes.
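The validation-set counts quoted above are enough to recompute the overall correct classification rate as a consistency check. The sketch below uses only the counts stated in the text; the dictionary layout and function names are hypothetical, and the classes to which the two misclassified seizure cases were assigned are not stated, so only per-class correct counts and totals are used.

```python
# (correctly classified, total) per true class, taken from the reported counts.
validation = {
    "seizures/convulsions": (54 - 2, 54),   # 2 of 54 misclassified
    "mild IAE": (22 - 5 - 1, 22),           # 5 as seizures, 1 as ANE
    "ANE": (9 - 2, 9),                      # 2 as mild IAE
}

def correct_classification_rate(counts):
    """Correctly classified observations divided by the sample size."""
    correct = sum(c for c, _ in counts.values())
    total = sum(n for _, n in counts.values())
    return correct / total

def misclassification_rates(counts):
    """Share of each true class that was assigned to any other class."""
    return {label: (n - c) / n for label, (c, n) in counts.items()}

rate = correct_classification_rate(validation)
per_class = misclassification_rates(validation)
```

With these counts, 75 of 85 validation patients are correctly classified, which is about 88.2% and agrees with the reported validation accuracy.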

Discussion
Neurological complications caused by influenza are serious conditions that occur mainly in young children and carry high morbidity and mortality rates [27]. Our study developed and internally validated a diagnostic model for distinguishing ANE from seizures/convulsions alone and mild IAE in children with influenza. The first measurements of biochemical and hematologic indicators on admission were evaluated using the RF method, avoiding the problem of model overfitting caused by correlations among the variables. The discrimination performance of the model was satisfactory, with an accuracy above 0.80 in both model development and internal validation. Seizures/convulsions were less likely to be wrongly classified, but mild IAE was prone to be misdiagnosed as seizures/convulsions. Of the children with ANE, around 20% were misdiagnosed as mild IAE, and none were misdiagnosed as seizures/convulsions. Our model, which includes only 10 common variables, is convenient for clinicians to use for early diagnosis and intervention.
In concordance with the results of previous studies, we found that convulsions, PCT, urea, γ-GT, AST, A/G, α-HBD, ALT, ALP, and CRP were important predictors of ANE. Indeed, ANE is characterized by frequent convulsions [7,8,28], as observed in the present study. A previous study showed that a combination of age < 4 years, repeated seizures, altered consciousness, and positive Babinski's sign were high-risk factors for ANE [28]. On the other hand, the literature suggests that there is no specific laboratory marker for the diagnosis of ANE [29][30][31], although elevated serum transaminases could be associated with ANE [7,30]. In the present study, the likelihood of a seizures/convulsions outcome decreased with increasing levels of AST, indicating that the risk of IAE or ANE increased as AST increased. Studies have shown that AST levels increase when soft-tissue necrosis occurs [32][33][34]. Early research showed that brain dehydration was maximal 30 min after urea injection and that cerebral circulation improved [35]; the increase in urea might be related to the reactive regulation of early cerebral edema. The endothelial cells of the capillaries of the cerebral cortex in rats showed high γ-GT activity [36], suggesting the possibility of cerebrovascular involvement in early ANE. Activities of α-HBD were measured in rats after intermittent exposure to aerogenic hypoxia but had no effects on adults [37] and were also associated with edema, ischemic and hemorrhagic changes [38]. In the early stages of influenza, increases in these factors could indicate a high risk of ANE.
In the present study, imaging parameters were not included in the analyses, mainly because of the broad range of examination types and protocols. Further studies should consider adding imaging variables to refine the present model. Indeed, the presence of brain imaging features is usually associated with a poor prognosis [39][40][41][42][43]. Furthermore, CSF examination in patients with ANE usually reveals increased amounts of protein [30]. However, CSF examination was not performed in all patients. In the present study, older children were more prone to have ANE. The importance of age was ranked 12th, but only the first 10 variables were included in the model based on the principle of minimum cross-validation error. The strain or subtype of influenza may play an important role in ANE development. However, detection of the strain or subtype may take a long time and was not routinely performed at our medical center, so it was not included in the forest model.
Compared with ANE, the discrimination ability of the model was lower for mild IAE. This could be because mild IAE is an intermediate condition between seizures/convulsions and ANE based on the symptoms [4]. Indeed, IAE is characterized by convulsions and coma [6,7,30], and death is not common [4]. Autopsy studies have shown cerebral vascular damage in patients with severe IAE, but necrotic changes were not seen [44,45]. Cerebrovascular involvement can be found in both severe IAE and ANE, which may be the reason for the overlap between them. In the early stage of disease, the above 10 variables in the forest model can be used to predict severe IAE or ANE and to prompt clinicians to order special examinations that evaluate the severity and prognosis of disease, such as thromboelastography [46], indicators of brain tissue necrosis (such as lactate dehydrogenase and malondialdehyde levels in CSF) [47], and special brain MRI sequences (e.g., thalamic proton magnetic resonance spectroscopy (MRS) measures [48], DWI, and MR angiography). These examinations are not routine in the diagnostic workup of emergency department patients. Furthermore, once the prediction model indicates severe IAE or ANE, clinicians can carry out early interventions such as low brain temperature, antiviral medication, immunoglobulin, glucocorticoids, and plasma exchange [4] as soon as possible to improve the prognosis, and the model can provide objective evidence for communication with the family and for obtaining treatment approval.
This study has some limitations. The correct classification rate for seizures is high in our model, but potential misclassification between mild IAE and ANE cannot be ignored. The ability to differentiate them might be improved by adding imaging characteristics and CSF parameters to the model. Furthermore, as the pathogenesis of IAE and ANE is not fully understood, the two overlap in diagnosis. Therefore, to avoid confusion, this study recruited only IAE patients without brain imaging abnormalities (defined as the mild IAE group) and ANE patients (with clinical and imaging findings conforming to the ANE diagnosis), aiming to predict the trend toward ANE from the earliest clinical information. In addition, only the first values of biochemical and hematological indicators on admission were considered, and subsequent changes in those indicators were not evaluated. Since ANE is a rapidly progressing condition, the exact timing of the evaluations may affect the results. Despite these limitations, we believe the conclusions of the study would not be overturned.

Conclusion
In the present study, 10 high-risk factors were selected as variables from clinical characteristics and serological indicators, including convulsions, PCT, urea, γ-GT, and AST, to develop a prediction model that could accurately distinguish ANE from seizures/convulsions. This practical diagnostic tool is convenient and provides valuable information for clinicians in choosing early interventions or planning further examinations. Nevertheless, the diagnostic accuracy for differentiating IAE from ANE, and mild IAE from seizures/convulsions, still needs to be improved. Further research is needed to refine this model.

Abbreviations
$$\text{Correct classification rate} = \frac{\text{True Seizures} + \text{True IAE} + \text{True ANE}}{\text{Number of patients in the data set}}$$

Fig. 6 ROC curves of the random forest model in the validation set. A, ROC curve for classifying seizures from the other two classes. B, ROC curve for classifying mild IAE from the other two classes. C, ROC curve for classifying ANE from the other two classes

Table 1
Demographic, clinical, and laboratory data of the patients

Table 3
Confusion matrix of the training and validation sets
IAE, influenza-associated encephalopathy; ANE, acute necrotizing encephalopathy